Our question involves disparity in the care for the elderly based on income and racial distrbutrions in California. We will be using spatial analysis to determine if there is a spatial pattern based on the dispersion (or lackthereof) of care facilities based on their bed size/capacity size.
We will be testing the hypothesis that facility capacity is correlated with income and race. There is also reason to believe that location matters: areas with neighboring low-income, POC populations will correspond with less bed sizes.
To test this, we will need to test if there is a spatial pattern, and then carry on to run some spatial regression models.
We will start with two basic datasets provided by Jen on the elderly/adult care facilities in CA.
Things are looking a little messy, so let’s clean up the dataset so we only keep the variables of interest.
adult.prep <- adult.res %>%
select(Type, Name, Administrator, Address, City, State, Zip, `County Name`, Capacity, `License First Date`, Status) %>%
filter(Type == "ADULT RESIDENTIAL" & Status == "LICENSED")
elderly.prep <- elderly.res %>%
select(Type, Name, Administrator, Address, City, State, Zip, `County Name`, Capacity, `License First Date`, Status) %>%
filter(Type == "RESIDENTIAL CARE ELDERLY" & Status == "LICENSED")
#Let's join the two
care.facilities <- adult.prep %>%
full_join(elderly.prep)
write_csv(care.facilities, "facilities.prepped.csv")
Now we can just load the prepped csv, and begin working on it. * First, we want to get combine the address, city, state, zip into one address * Then we can send that to google maps api to get the geo coordinates for mapping
Adding in census data.
census_api_key(Sys.getenv("CENSUS_API_KEY"))
care.facilities <- read_csv("care.fac.geo.csv")
#getting data based county
ca.income.county <- get_acs(geography = "county", variables = "B19013_001", state = "CA", geometry = TRUE)
Mapping
library(leaflet)
library(stringr)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3
library(viridis)
## Loading required package: viridisLite
library(viridisLite)
care.facilities <- read_csv("care.fac.geo.csv")
## Parsed with column specification:
## cols(
## Type = col_character(),
## Name = col_character(),
## Administrator = col_character(),
## Address = col_character(),
## City = col_character(),
## State = col_character(),
## Zip = col_integer(),
## County.Name = col_character(),
## Capacity = col_integer(),
## License.First.Date = col_character(),
## Status = col_character(),
## Location = col_character(),
## Lat = col_double(),
## Long = col_double()
## )
#getting data on median income, census tract
ca.income <- get_acs(geography = "tract", variables = "B19013_001", state = "CA", geometry = TRUE)
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
#setting color
pal <- colorNumeric(palette = "viridis",
domain = ca.income$estimate)
getColor <- function(care.facilities) {
sapply(care.facilities$Capacity, function(Capacity) {
if(Capacity <= 12) {
"red"
} else if(Capacity >= 13 & Capacity <= 24) {
"orange"
} else if(Capacity >= 25 & Capacity <=100) {
"yellow"
} else {
"green"
} } )
}
ca.income %>%
st_transform(crs = "+init=epsg:4326") %>%
leaflet(width = "100%") %>%
addTiles() %>%
addPolygons(stroke= FALSE,
smoothFactor = 0,
fillOpacity = 0.7,
color = ~ pal(estimate)) %>%
addCircles(data = care.facilities,
lat = ~Lat,
lng = ~Long,
label = ~as.character(Address),
weight = 2,
radius = 20,
stroke = TRUE,
fillOpacity = 0.8,
color = getColor(care.facilities)
) %>%
addLegend("bottomright",
pal = pal,
values = ~ estimate,
title = "Median Home Value",
labFormat = labelFormat(prefix = "$"),
opacity = 1) %>%
addLegend("bottomleft",
colors= c("Red", "Orange", "Yellow", "Green"),
labels = c("4-12 Beds", "13-24 Beds", "25-100 Beds", "101+ Beds"),
title = "Capacity Size")
## Warning in validateCoords(lng, lat, funcName): Data contains 433 rows with
## either missing or invalid lat/lon values and will be ignored
Okay so for the most part, I figured it out, no idea why there are some corrdinates in like Michigan. Time to aggregate data for regression analysis.
What I’m thinking is we can try to map all the points and see what it looks like first. Using circle radius perhaps? Checklist of things:
#clean up data, 3 values that are not in California